Concentration of measures supported on the cube 
Abstract

We prove a log-Sobolev inequality for a certain class of log-concave measures in high dimension. These are the probability measures supported on the unit cube $[0,1]^n \subset \mathbb{R}^n$ whose density takes the form $\exp(-\psi)$ where the function $\psi$ is assumed to be convex (but not strictly convex) with bounded pure second derivatives. Our argument relies on a transportation-cost inequality à la Talagrand and on the notion of translation-invariant relative entropy.

1 Introduction 

Consider a cube $Q \subset \mathbb{R}^n$ of sidelength $\ell$ parallel to the axes, that is, $Q$ is a translation of the set $(0,\ell)^n \subset \mathbb{R}^n$ (or of its closure, equivalently). In this paper we prove a concentration inequality for a class of probability measures supported on $Q$.

Write $|\cdot|$ for the standard Euclidean norm in $\mathbb{R}^n$, and let $B_2^n = \{x \in \mathbb{R}^n ; |x| \le 1\}$ be the Euclidean unit ball centered at the origin. For a subset $A \subset \mathbb{R}^n$ denote $A + \varepsilon B_2^n = \{x + \varepsilon y ; x \in A, y \in B_2^n\}$, the $\varepsilon$-neighborhood of the set $A$.

Theorem 1.1 Let $\ell > 0$, $M \ge 0$ and let $Q \subset \mathbb{R}^n$ be a cube of sidelength $\ell$ parallel to the axes. Let $\mu$ be a probability measure supported on $Q$ with density $\exp(-\psi)$ for a convex function $\psi : Q \to \mathbb{R}$ such that

$$\partial_{ii}\psi(x) \le M \qquad \text{for all } x \in Q,\ i = 1,\dots,n. \tag{1}$$

Suppose that $A \subset \mathbb{R}^n$ is a measurable set with $\mu(A) \ge 1/2$. Then, for all $t > 0$,

$$\mu(A + tB_2^n) \ge 1 - \exp(-t^2/\alpha^2) \tag{2}$$

where $\alpha = \alpha(\ell, M) = 3\ell e^{M\ell^2/8}$.
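To get a feel for the constant in Theorem 1.1, the following minimal sketch evaluates $\alpha(\ell, M) = 3\ell e^{M\ell^2/8}$ and the resulting bound $\exp(-t^2/\alpha^2)$ on the measure of the complement of $A + tB_2^n$. The function names `alpha` and `tail_bound` are hypothetical and are introduced only for this illustration.

```python
import math

def alpha(ell: float, M: float) -> float:
    """Constant alpha(ell, M) = 3 * ell * exp(M * ell^2 / 8) of Theorem 1.1."""
    return 3.0 * ell * math.exp(M * ell ** 2 / 8.0)

def tail_bound(t: float, ell: float, M: float) -> float:
    """Bound exp(-t^2 / alpha^2) on the mass outside A + t*B when mu(A) >= 1/2."""
    a = alpha(ell, M)
    return math.exp(-(t / a) ** 2)

# The bound is dimension-free but degrades exponentially in M * ell^2.
for M in (0.0, 1.0, 8.0):
    print(M, alpha(1.0, M), tail_bound(5.0, 1.0, M))
```

With $\ell = 1$ and $M = 0$ the constant equals 3, which is the constant appearing in the exponent $t^2/9$ of Corollary 1.3.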

Theorem 1.1 is equivalent to a logarithmic Sobolev inequality and to a concentration inequality for Lipschitz functions, see Section 4 below. In probabilistic terminology, we consider uniformly bounded random variables $X_1, \dots, X_n$, possibly dependent, whose joint distribution satisfies the convexity/concavity assumptions of Theorem 1.1. Our results correspond to bounds for the variance and tail distribution of $f(X_1, \dots, X_n)$ where $f$ is a Lipschitz function on $\mathbb{R}^n$.

We emphasize that we are not assuming any product structure, any symmetries nor strict convexity for the function $\psi$ from Theorem 1.1. There is a vast body of literature pertaining to the case in which the measure $\mu$ is an arbitrary product measure in the cube, see Talagrand [21], Marton [14], Dembo and Zeitouni [6], Ledoux [13] and others. When the function $\psi$ from Theorem 1.1 admits a uniform positive lower bound for the Hessian, the conclusion of Theorem 1.1 is well-known and essentially goes back to Bakry and Émery [1].

How can we produce probability measures satisfying the assumptions of Theorem 1.1 with, say, $M = 1$? One may begin with the standard Gaussian density in $\mathbb{R}^n$, the function

$$\gamma_n(x) = (2\pi)^{-n/2} \exp(-|x|^2/2) \qquad (x \in \mathbb{R}^n).$$

The restriction of $\gamma_n$ to any cube $Q \subset \mathbb{R}^n$, normalized to be a probability density, surely satisfies the assumptions of Theorem 1.1 with $M = 1$. Furthermore, begin with any probability density $\rho : \mathbb{R}^n \to [0,\infty)$ which is log-concave (that is, the function $-\log \rho$ is convex). Consider the convolution

$$f(x) = (\rho * \gamma_n)(x) = \int_{\mathbb{R}^n} \rho(y)\gamma_n(x - y)\,dy.$$

Then $f$ is a smooth, log-concave probability density according to the Prékopa-Leindler inequality. Furthermore, it is straightforward to verify that for any $x \in \mathbb{R}^n$,

$$(\nabla^2 \log f)(x) \ge -\mathrm{Id} \tag{3}$$

in the sense of symmetric matrices, where $\mathrm{Id}$ is the identity matrix and $\nabla^2 \log f$ is the Hessian of $\log f$. We conclude that the probability measure on the cube $Q$ whose density is proportional to the restriction of $f$ to $Q$ satisfies the assumptions of Theorem 1.1 with $M = 1$. It is also possible to view the probability densities that appear in Theorem 1.1 as convex perturbations of probability densities proportional to $x \mapsto \exp(x \cdot v)$ on the cube. Here $x \cdot v$ is the standard scalar product of $x, v \in \mathbb{R}^n$.

One cannot replace $\alpha(\ell, M)$ in Theorem 1.1 by a dimension-free expression that is subexponential in $M\ell^2$, see Remark 4.5 below. We say that a vector $x \in \mathbb{R}^n$ is proportional to one of the standard unit vectors when it has at most one non-zero entry. A unit cube has sidelength one. Theorem 1.1 will be deduced from the following result:

Theorem 1.2 Let $R \ge 1$ and let $Q \subset \mathbb{R}^n$ be a unit cube parallel to the axes. Let $\mu$ be a probability measure supported on $Q$ with a log-concave density $f$ such that

$$f(\lambda x + (1 - \lambda)y) \le R\,[\lambda f(x) + (1 - \lambda)f(y)] \tag{4}$$

for any $0 < \lambda < 1$ and any $x, y \in Q$ for which $x - y$ is proportional to one of the standard unit vectors. Suppose that $A \subset \mathbb{R}^n$ is a measurable set with $\mu(A) \ge 1/2$. Then for all $t > 0$,

$$\mu(A + tB_2^n) \ge 1 - \exp(-t^2/\alpha^2)$$

where $\alpha = \alpha(R) = 3R$.

The inequality (4) holds true with $R = 1$ when $f$ is a convex function. By degenerating Theorem 1.2 to the petty case where $R = 1$ we arrive at the following peculiar corollary:

Corollary 1.3 Let $Q \subset \mathbb{R}^n$ be a unit cube. Let $\mu$ be a probability measure on $Q$ whose density is both log-concave and convex in $Q$. Then for any measurable $A \subset \mathbb{R}^n$ and $t > 0$,

$$\mu(A) \ge 1/2 \quad \Longrightarrow \quad \mu(A + tB_2^n) \ge 1 - \exp(-t^2/9).$$
A moment of reflection reveals that there do exist quite a few positive, integrable functions on the cube that are simultaneously log-concave and convex, such as $x \mapsto [b + (x \cdot v)]^p$ for $p \ge 1$. It is also evident that one can eliminate neither the log-concavity assumption nor the convexity assumption from Corollary 1.3.

The proof of Theorem 1.2 utilizes the Knothe map from [12] in order to analyze the deficit in the Prékopa-Leindler inequality, an idea proposed also in Eldan and Klartag [7]. Rather than working directly with the supremum-convolution, we prefer to analyze an expression that somewhat resembles the relative-entropy functional, but is also invariant under translations. Let us shed some light on this expression. Suppose that $f$ and $g$ are non-negative functions defined on $\mathbb{R}^n$. For a point $x \in \mathbb{R}^n$ in which $f$ is positive and differentiable, and for a point $y \in \mathbb{R}^n$ in which $g$ is positive, we set

$$S_y\{g, f\}(x) = f(x) \log \frac{g(y)}{f(x)} - \nabla f(x) \cdot (y - x). \tag{5}$$

We denote $\mathrm{Supp}(f) = \{x ; f(x) \neq 0\}$. For a point $x \in \mathrm{Supp}(f)$ in which $f$ is differentiable, put

$$S\{g, f\}(x) = \sup_{y \in \mathrm{Supp}(g)} S_y\{g, f\}(x). \tag{6}$$

An equivalent expression for $S\{g, f\}(x)$ may be described as follows: Denote $\psi = -\log f$ and $\varphi = -\log g$. Then,

$$S\{g, f\}(x) = [\nabla f(x) \cdot x - f(x) \log f(x)] + f(x)\,\varphi^*(\nabla \psi(x)) \tag{7}$$

where $\varphi^*(v) = \sup_{y \in \mathrm{Supp}(g)} [v \cdot y - \varphi(y)]$ is the usual Legendre transform of $\varphi$.
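The agreement between the definition (6) and the Legendre-transform formula (7) can be checked numerically in one dimension. The sketch below is only an illustration under assumed choices that do not come from the text: $f = e^{-x^2/2}$ and $g = e^{-(y-1)^2}$, for which $\varphi^*(v) = v + v^2/4$. The supremum in (6) is approximated on a finite grid, and the helper names are hypothetical.

```python
import math

def f(x):            # f = exp(-psi) with psi(x) = x^2 / 2 (assumed example)
    return math.exp(-x * x / 2.0)

def log_g(y):        # g = exp(-phi) with phi(y) = (y - 1)^2 (assumed example)
    return -(y - 1.0) ** 2

def S_sup(x, lo=-10.0, hi=10.0, n=20000):
    """Definition (6): grid approximation of the sup over y of S_y{g, f}(x) from (5)."""
    fx = f(x)
    dfx = -x * fx    # derivative f'(x)
    best = -math.inf
    for k in range(n + 1):
        y = lo + (hi - lo) * k / n
        best = max(best, fx * (log_g(y) - math.log(fx)) - dfx * (y - x))
    return best

def S_legendre(x):
    """Formula (7), with phi*(v) = v + v^2 / 4 and psi'(x) = x for this example."""
    fx = f(x)
    dfx = -x * fx
    v = x
    return (dfx * x - fx * math.log(fx)) + fx * (v + v * v / 4.0)

for x in (-1.3, 0.0, 0.7):
    print(x, S_sup(x), S_legendre(x))
```

The two columns agree up to the discretization error of the grid, since the supremum over $y$ is attained at $y = 1 + x/2$ in this example.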

Definition 1.4 Let $f, g : \mathbb{R}^n \to [0, \infty)$ have finite, positive integrals. Assume that $f$ is differentiable almost-everywhere in $\mathrm{Supp}(f)$. Set

$$\mathrm{Tire}(g \,\|\, f) = \int_{\mathrm{Supp}(f)} S\{g, f\}(x)\,dx - \left(\int_{\mathbb{R}^n} f\right) \log \frac{\int_{\mathbb{R}^n} g}{\int_{\mathbb{R}^n} f}$$

whenever the integral of $S\{g, f\}$ is well-defined. Here, Tire is an acronym of "Translation-Invariant Relative Entropy".

It is certainly possible to consider $\mathrm{Tire}(g \,\|\, f)$ for non-negative functions defined only on a subset of $\mathbb{R}^n$ by treating such functions as zero outside their original domain of definition. The notion is indeed translation-invariant: For functions $f, g$ as in Definition 1.4 and for $x_0 \in \mathbb{R}^n$, denoting $T_{x_0}(g)(x) = g(x - x_0)$ we have

$$\mathrm{Tire}(g \,\|\, f) - \mathrm{Tire}(T_{x_0}(g) \,\|\, f) = \int_{\mathbb{R}^n} (\nabla f(x) \cdot x_0)\,dx = 0,$$

where we assume that $f$ is locally-Lipschitz and vanishes at infinity in order to justify the integration by parts. Our original motivation for Definition 1.4 is that at least in the smooth, log-concave case, $S\{g, f\}(x)$ equals $dh_\varepsilon(x)/d\varepsilon\,|_{\varepsilon = 0}$ where

$$h_\varepsilon(x) = \sup_{y \in \mathbb{R}^n} f(x + \varepsilon y)^{1 - \varepsilon}\, g(x - (1 - \varepsilon)y)^{\varepsilon} \qquad (x \in \mathbb{R}^n).$$

In other words, $\mathrm{Tire}(g \,\|\, f)$ may be viewed as a kind of "mixed volume" of log-concave functions, see [10, Section 3] for further explanations. It is also possible to think about $\mathrm{Tire}(g \,\|\, f)$, for a log-concave function $f$, as a parameter measuring the proximity of $g$ to a translate of $f$. See also Remark 2.3 below. Neither of these interpretations is used directly in the formal derivation below.
The remainder of this paper is devoted to the proofs of the aforementioned theorems and to related results. We write $\overline{A}$ for the closure of the set $A$, and log stands for the natural logarithm. By "measurable" we always mean Borel-measurable.

Acknowledgements. I thank Ronen Eldan for interesting, related discussions. The re- 
search was supported in part by the Israel Science Foundation and by a Marie Curie Rein- 
tegration Grant from the Commission of the European Communities. 



2 Convex functions on an interval 

In addition to the definitions (5) and (6) above, we make use of the following: For functions $f, g : \mathbb{R}^n \to [0, \infty)$ and a map $T : \mathrm{Supp}(f) \to \mathrm{Supp}(g)$ set

$$S_T\{g, f\}(x) = S_{T(x)}\{g, f\}(x) \qquad (x \in \mathrm{Supp}(f)),$$

assuming that the right-hand side is well-defined. Observe that $S_T\{g, f\} \le S\{g, f\}$ pointwise. Let $I, J \subset \mathbb{R}$ be two intervals of finite, positive length and let $f, g$ be positive, integrable functions defined on $I, J$ respectively. The monotone transportation map between $f$ and $g$ is the map $T : I \to J$ defined via

$$\frac{1}{\int_I f} \int_I f(t)\,1_{\{t < x\}}\,dt = \frac{1}{\int_J g} \int_J g(t)\,1_{\{t < T(x)\}}\,dt \qquad (x \in I) \tag{8}$$

where $1_{\{t < x\}}$ equals one when $t < x$ and vanishes otherwise. The map $T$ is uniquely defined, as $f, g$ are positive and integrable. Furthermore, $T$ is an absolutely-continuous, strictly-increasing function. Note that for almost every $x \in I$,

$$T'(x) = \left(\frac{\int_J g}{\int_I f}\right) \frac{f(x)}{g(T(x))}. \tag{9}$$
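The defining relation (8) says that $T$ matches the normalized cumulative distribution functions of $f$ and $g$. A minimal numerical sketch of this construction follows; it uses a midpoint rule and linear interpolation on a uniform grid, which are discretization choices made here and not part of the text, and the helper names `cumulative` and `monotone_transport` are hypothetical.

```python
def cumulative(h, lo, hi, n):
    """Midpoint-rule cumulative integrals of h over n equal cells of [lo, hi]."""
    step = (hi - lo) / n
    total, cum = 0.0, [0.0]
    for k in range(n):
        total += h(lo + (k + 0.5) * step) * step
        cum.append(total)
    return cum, step

def monotone_transport(f, g, I, J, x, n=20000):
    """Approximate T(x) from (8): normalized mass of f left of x equals that of g left of T(x)."""
    cum_f, step_f = cumulative(f, I[0], I[1], n)
    cum_g, step_g = cumulative(g, J[0], J[1], n)
    # Normalized mass of f to the left of x, by linear interpolation inside a cell.
    pos = (x - I[0]) / step_f
    k = min(int(pos), n - 1)
    mass = (cum_f[k] + (pos - k) * (cum_f[k + 1] - cum_f[k])) / cum_f[n]
    target = mass * cum_g[n]
    # Invert the cumulative integral of g by bisection over the cell index.
    lo, hi = 0, n
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if cum_g[mid] <= target:
            lo = mid
        else:
            hi = mid
    cell = cum_g[lo + 1] - cum_g[lo]
    frac = (target - cum_g[lo]) / cell if cell > 0 else 0.0
    return J[0] + (lo + frac) * step_g

# Example: f = 1 and g(t) = 2t on [0, 1], where (8) reads x = T(x)^2.
T_quarter = monotone_transport(lambda t: 1.0, lambda t: 2.0 * t, (0.0, 1.0), (0.0, 1.0), 0.25)
print(T_quarter)
```

In this example the exact map is $T(x) = \sqrt{x}$, and one can check that it satisfies (9): $T'(x) = 1/(2\sqrt{x}) = f(x)/g(T(x))$.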

We will frequently encounter the case where $I = J$. Clearly, in this case $T(x) = x$ for $x \in \partial I$, where $\partial I$ are the two endpoints of the interval $I$. Our goal in this section is to prove the following transportation-cost inequality in one dimension:

Proposition 2.1 Let $R \ge 1$ and let $I \subset \mathbb{R}$ be an interval of length one. Let $f : I \to (0, \infty)$ be an absolutely-continuous function which satisfies

$$f(\lambda x + (1 - \lambda)y) \le R\,[\lambda f(x) + (1 - \lambda) f(y)] \qquad \text{for all } x, y \in I,\ 0 < \lambda < 1.$$

Let $g$ be a positive, integrable function on $I$, and let $T$ be the monotone transportation map between $f$ and $g$. Then,

$$\int_I |T(x) - x|^2 f(x)\,dx \le CR^2 \left[\int_I S_T\{g, f\} - \left(\int_I f\right) \log \frac{\int_I g}{\int_I f}\right] \le CR^2 \cdot \mathrm{Tire}(g \,\|\, f), \tag{10}$$

where $C \le 40/9$ is a universal constant.



The proof of Proposition 2.1 requires a few lemmata. Our first lemma is essentially an infinitesimal version of the Prékopa-Leindler inequality, and its proof follows the transportation proofs given by Barthe [2], Henstock-Macbeath [9] and Talagrand [20]. For $t \in \mathbb{R}$ denote

$$\Lambda(t) = \min\{|t|, t^2\}.$$
Lemma 2.2 Let $I \subset \mathbb{R}$ be an interval of finite, positive length. Let $f, g$ be positive, integrable functions on $I$ with $f$ being absolutely continuous. Then,

$$\int_I \Lambda(T'(x) - 1) f(x)\,dx \le \frac{10}{3} \left[\int_I S_T\{g, f\} - \left(\int_I f\right) \log \frac{\int_I g}{\int_I f}\right]$$

where $T$ is the monotone transportation map between $f$ and $g$.
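Proof: We use (9) and compute

$$\begin{aligned}
\int_I S_T\{g, f\} &= \int_I \left[f(x) \log \frac{g(T(x))}{f(x)} - f'(x)(T(x) - x)\right] dx \\
&= \left(\int_I f\right) \log \frac{\int_I g}{\int_I f} + \int_I \left[f(x) \log \frac{1}{T'(x)} - f'(x)(T(x) - x)\right] dx && (11) \\
&= \left(\int_I f\right) \log \frac{\int_I g}{\int_I f} + \int_I \left[-f(x) \log T'(x) + f(x)(T'(x) - 1)\right] dx && (12)
\end{aligned}$$

where the integration by parts is legitimate as $f(x)(T(x) - x)$ is an absolutely-continuous function in $I$ that vanishes on $\partial I$. In order to complete the proof of the lemma it remains to show that for all $x > 0$,

$$-\log x + (x - 1) \ge \frac{3}{10} \cdot \min\{|x - 1|, (x - 1)^2\}. \tag{13}$$

Indeed, for $0 < x \le 2$ we use the Cauchy-Schwarz inequality and obtain

$$\begin{aligned}
(x - 1) - \log x &= \int_1^x \frac{t - 1}{t}\,dt = \int_0^{|x - 1|} \frac{t}{1 + \mathrm{sgn}(x - 1)t}\,dt \ge \int_0^{|x - 1|} \frac{t}{1 + t}\,dt \\
&\ge \frac{\left(\int_0^{|x - 1|} t\,dt\right)^2}{\int_0^{|x - 1|} (1 + t)t\,dt} = \frac{(x - 1)^2}{2(1 + 2|x - 1|/3)} \ge \frac{3(x - 1)^2}{10},
\end{aligned}$$

where $\mathrm{sgn}(x) = 1$ for $x \ge 0$ and $\mathrm{sgn}(x) = -1$ for $x < 0$. The inequality (13) is valid in particular for $x = 2$. For $x > 2$ the derivative of the left-hand side in (13) exceeds that of the right-hand side. Hence (13) holds true for all $x > 0$. □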
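The elementary inequality (13) is easy to sanity-check numerically. The following sketch evaluates the slack $(x - 1) - \log x - \tfrac{3}{10}\Lambda(x - 1)$ on a grid; the grid and the names `Lam` and `gap` are choices made for this illustration only, and a grid check is of course not a proof.

```python
import math

def Lam(t):
    """Lambda(t) = min(|t|, t^2), as defined before Lemma 2.2."""
    return min(abs(t), t * t)

def gap(x):
    """Left-hand side of (13) minus its right-hand side; should be non-negative for x > 0."""
    return (x - 1.0 - math.log(x)) - 0.3 * Lam(x - 1.0)

xs = [k / 1000.0 for k in range(1, 20001)]   # grid on (0, 20]
worst = min(gap(x) for x in xs)
print(worst, gap(2.0))
```

The slack vanishes at $x = 1$ and is small but positive at $x = 2$, where $1 - \log 2 \approx 0.307$ is compared with $3/10$; this is the point that limits the constant in (13).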



Remark 2.3 The proof of Lemma 2.2 admits a generalization to $n$ dimensions, in which one utilizes the Brenier map in place of the transportation map $T$. See Barthe and McCann [15] for related arguments. In this way one obtains the inequality

$$\mathrm{Tire}(g \,\|\, f) \ge 0, \tag{14}$$

which is valid for any Lipschitz, non-negative, compactly-supported functions $f$ and $g$ on $\mathbb{R}^n$ with a finite positive integral. Equality is attained in (14) when $f$ is the restriction of a log-concave function on $\mathbb{R}^n$ to some measurable set, and $g$ is proportional to a translate of $f$. We do not know of any other cases of equality.

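Lemma 2.4 Let $R \ge 1$ and let $I \subset \mathbb{R}$ be an interval of length one. Assume that $\rho$ is a positive, integrable function on $I$ that satisfies

$$\rho(\lambda x + (1 - \lambda)y) \le R\,[\lambda \rho(x) + (1 - \lambda)\rho(y)] \qquad \text{for all } x, y \in I,\ 0 < \lambda < 1. \tag{15}$$

Then for any $a, b \in I$ with $a < b$,

$$\int_a^b \rho(x)\,dx \le \frac{R}{2}\,[\rho(a) + \rho(b)].$$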
Proof: We simply integrate (15) over $\lambda \in [0, 1]$. Since $b - a \le 1$, then

$$\int_a^b \rho \le \int_0^1 \rho(\lambda a + (1 - \lambda)b)\,d\lambda \le R \int_0^1 [\lambda \rho(a) + (1 - \lambda)\rho(b)]\,d\lambda = R\,\frac{\rho(a) + \rho(b)}{2}$$

and the lemma is proven. □
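As a quick illustration of Lemma 2.4, the sketch below tests the inequality for one assumed example that does not come from the text: $\rho(x) = 1 + x^2$ on $I = [0, 1]$, which is convex and therefore satisfies (15) with $R = 1$. The integral is approximated by a midpoint rule, and the helper names are hypothetical.

```python
def rho(x):
    """Assumed example: a convex function, so (15) holds with R = 1."""
    return 1.0 + x * x

def integral(h, a, b, n=2000):
    """Midpoint-rule approximation of the integral of h over [a, b]."""
    step = (b - a) / n
    return sum(h(a + (k + 0.5) * step) for k in range(n)) * step

def lemma_2_4_holds(a, b, R=1.0):
    """Check the conclusion of Lemma 2.4 for the pair a < b, with a small tolerance."""
    return integral(rho, a, b) <= R / 2.0 * (rho(a) + rho(b)) + 1e-12

pairs = [(i / 10.0, j / 10.0) for i in range(11) for j in range(i + 1, 11)]
all_ok = all(lemma_2_4_holds(a, b) for a, b in pairs)
print(all_ok)
```

The assumption that the interval has length at most one is what allows the factor $b - a$ to be dropped in the first inequality of the proof.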

The following lemma is a one-dimensional Poincaré-type inequality. The proof closely follows the argument by Cheeger [5]. Recall that $\Lambda(t) = \min\{|t|, t^2\}$.

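Lemma 2.5 Let $I \subset \mathbb{R}$ be an interval of length one and let $R \ge 1$. Let $\rho$ be a positive, integrable function on $I$ with

$$\rho(\lambda x + (1 - \lambda)y) \le R\,[\lambda \rho(x) + (1 - \lambda)\rho(y)] \qquad \text{for all } x, y \in I,\ 0 < \lambda < 1.$$

Then for any absolutely-continuous function $f : I \to \mathbb{R}$ with $f|_{\partial I} = 0$,

$$\int_I \Lambda(f)\rho \le \frac{4}{3} R^2 \int_I \Lambda(f')\rho. \tag{16}$$

Here, $\partial I$ consists of the two endpoints of the interval $I$.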

Proof: Multiplying $\rho$ by a constant, we may assume that $\int_I \rho = 1$. Let $g$ be an absolutely-continuous, non-negative function on $I$ with $g|_{\partial I} = 0$. In the first step of the proof we show that

$$\int_I g\rho \le \frac{R}{2} \int_I |g'|\rho. \tag{17}$$

Denote $J = g(I) = \{g(x) ; x \in I\}$, an interval whose left boundary point is zero. We apply the change of variables $y = g(x)$ and conclude that

$$\int_I |g'(x)|\rho(x)\,dx = \int_J \Big(\sum_{x \in g^{-1}(y)} \rho(x)\Big)\,dy. \tag{18}$$

For any $0 \neq y \in J$ consider the open set $I_y = \{x \in I ; g(x) > y\}$. When $y$ is a regular non-zero value of $g$, the open set $I_y$ is a finite union of intervals with disjoint closures. According to Lemma 2.4, for any such $y$,

$$\frac{R}{2} \sum_{x \in g^{-1}(y)} \rho(x) \ge \int_{I_y} \rho.$$

The one-dimensional Sard's lemma for absolutely-continuous functions implies that almost any $y \in J$ is a regular value of $g$. Therefore, from (18) we obtain

$$\int_I |g'(x)|\rho(x)\,dx \ge \frac{2}{R} \int_J \Big(\int_{\{x ; g(x) > y\}} \rho(x)\,dx\Big)\,dy = \frac{2}{R} \int_I g\rho$$

where the last equality follows from application of Fubini's theorem. Thus (17) is proven. In order to prove (16), observe that for any $x \ge 0$ and $0 \le y \le 1$,

$$xy \le R\Lambda(x) + \frac{y^2}{4R}. \tag{19}$$

Indeed, (19) holds for $x \ge 1$ since the coefficient in front of $\Lambda(x)$ is at least one, and (19) may be directly proven for $0 \le x \le 1$ by completing a square. Let $f : I \to \mathbb{R}$ be an absolutely-continuous function with $f|_{\partial I} = 0$. Applying (17) with $g = \Lambda(|f|)$ and using (19) we see that

$$\begin{aligned}
\int_I \Lambda(|f|)\rho &\le \frac{R}{2} \int_I \Lambda'(|f|)|f'|\rho \le R \int_I |f'| \min\{|f|, 1\}\rho \\
&\le R \cdot \left[R \int_I \Lambda(|f'|)\rho + \frac{1}{4R} \int_I \min\{|f|^2, 1\}\rho\right] \\
&\le R^2 \int_I \Lambda(|f'|)\rho + \frac{1}{4} \int_I \Lambda(|f|)\rho.
\end{aligned}$$

By subtracting the right-most summand from the left-hand side, we deduce (16).
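The pointwise inequality (19), which drives the last chain of estimates, can be checked on a grid. The sketch below does so for a few values of $R \ge 1$; the grid and the chosen values of $R$ are arbitrary illustration choices, and the names `Lam` and `slack` are hypothetical.

```python
def Lam(t):
    """Lambda(t) = min(|t|, t^2)."""
    return min(abs(t), t * t)

def slack(x, y, R):
    """Right-hand side of (19) minus its left-hand side; non-negative for x >= 0, 0 <= y <= 1."""
    return R * Lam(x) + y * y / (4.0 * R) - x * y

worst = min(
    slack(i / 50.0, j / 50.0, R)
    for R in (1.0, 1.5, 4.0)
    for i in range(0, 251)      # x in [0, 5]
    for j in range(0, 51)       # y in [0, 1]
)
print(worst)
```

For $0 \le x \le 1$ the slack equals $(\sqrt{R}\,x - y/(2\sqrt{R}))^2$, so it vanishes exactly on the line $x = y/(2R)$; this is the "completing a square" step mentioned in the proof.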

Proof of Proposition 2.1: Since $T(x) = x$ for $x \in \partial I$ we may invoke Lemma 2.5 and conclude that

$$\int_I \Lambda(T(x) - x) f(x)\,dx \le \frac{4}{3} R^2 \int_I \Lambda(T'(x) - 1) f(x)\,dx \le \frac{40}{9} R^2 \left[\int_I S_T\{g, f\} - \left(\int_I f\right) \log \frac{\int_I g}{\int_I f}\right]$$

where we used Lemma 2.2 in the last passage. Since $I$ is an interval of length one and $T : I \to I$, then for any $x \in I$ we have $|T(x) - x| \le 1$. Consequently, for any $x \in I$,

$$\Lambda(T(x) - x) = \min\{|T(x) - x|^2, |T(x) - x|\} = |T(x) - x|^2.$$

This completes the proof of (10). The proposition now follows from the definition of $\mathrm{Tire}(g \,\|\, f)$. □



3 Induction on the dimension 

In this section we obtain higher-dimensional analogs of Proposition 2.1. Suppose that $f, g$ are non-negative functions defined on $\mathbb{R}^n$ with a finite, positive integral. A measurable map $T : \mathrm{Supp}(f) \to \mathrm{Supp}(g)$ is called a transportation map between $f$ and $g$ if for any measurable set $A \subset \mathrm{Supp}(g)$,

$$\frac{1}{\int_{\mathbb{R}^n} f} \int_{T^{-1}(A)} f = \frac{1}{\int_{\mathbb{R}^n} g} \int_A g.$$

That is, $T$ pushes forward the probability measure whose density is proportional to $f$, to the probability measure whose density is proportional to $g$. Observe that the monotone transportation in one dimension is indeed a transportation map. Two important examples of transportation maps in higher dimensions are the Brenier map [4] and the Knothe map [12]. It is the Knothe map which we use in the proof of the following theorem.

Theorem 3.1 Let $R \ge 1$. Let $Q \subset \mathbb{R}^n$ be a unit cube parallel to the axes. Assume that $f : Q \to (0, \infty)$ is a Lipschitz function with

$$f(\lambda x + (1 - \lambda)y) \le R\,[\lambda f(x) + (1 - \lambda) f(y)] \tag{20}$$

for any $0 < \lambda < 1$ and any $x, y \in Q$ for which $x - y$ is proportional to one of the standard unit vectors in $\mathbb{R}^n$. Let $g$ be a positive, integrable function on $Q$. Then there exists a transportation map $T$ between $f$ and $g$ such that

$$\int_Q |T(x) - x|^2 f(x)\,dx \le CR^2 \cdot \mathrm{Tire}(g \,\|\, f), \tag{21}$$

where $C \le 40/9$ is a universal constant.
The requirement that $f$ be a Lipschitz function should not be taken too seriously, as it may easily be replaced by other types of regularity assumptions. Theorem 3.1 will be proven by induction on the dimension, where the induction step is going to be Proposition 2.1 in disguise. Throughout this section we use

$$x = (y, r) \in \mathbb{R}^{n-1} \times \mathbb{R}$$

as coordinates in $\mathbb{R}^n$. For a function $f$ defined on a subset of $\mathbb{R}^n$ and for $y \in \mathbb{R}^{n-1}$, we write

$$f_y(r) = f(y, r) \qquad (r \in \mathbb{R})$$

whenever $(y, r)$ is in the domain of definition of $f$. Abbreviate $\pi(y, r) = y$. For a subset $A \subset \mathbb{R}^n$ we denote $\pi(A) = \{\pi(x) ; x \in A\}$. For a non-negative, integrable function $f$ defined on a subset $A \subset \mathbb{R}^n$, we set

$$\pi(f)(y) = \int_{\mathbb{R}} f_y(r)\,1_{\{(y, r) \in A\}}\,dr \qquad (y \in \pi(A)).$$

The following lemma is a corollary to Proposition 2.1.

Lemma 3.2 Let $R \ge 1$. Let $Q = A \times I \subset \mathbb{R}^n$ where $I \subset \mathbb{R}$ is an interval of length one and $A \subset \mathbb{R}^{n-1}$. Let $f$ be a positive, Lipschitz function on $Q$ such that

$$f(\lambda x + (1 - \lambda)y) \le R\,[\lambda f(x) + (1 - \lambda) f(y)] \tag{22}$$

for any $0 < \lambda < 1$ and any $x, y \in Q$ for which $x - y$ is proportional to one of the standard unit vectors in $\mathbb{R}^n$. Let $g$ be a positive function on $Q$, and fix $z \in \pi(Q)$ such that $g_z$ is integrable.

Let $y \in \pi(Q)$, and let $T : I \to I$ be the monotone transportation map between $f_y$ and $g_z$. Suppose that $\tilde{T} : Q \to Q$ is a measurable map such that $\tilde{T}(y, r) = (z, T(r))$ for $r \in I$. Then for almost any choice of $y \in \pi(Q)$,

$$\int_I |T(r) - r|^2 f_y(r)\,dr \le CR^2 \left[\int_I S_{\tilde{T}}\{g, f\}(y, r)\,dr - S_z\{\pi(g), \pi(f)\}(y)\right]$$

where $C \le 40/9$ is a universal constant.

Proof: According to the definitions (5) and (6), whenever $f$ is differentiable at $(y, r)$,

$$S_{\tilde{T}}\{g, f\}(y, r) = S_T\{g_z, f_y\}(r) - \nabla_y f(y, r) \cdot (z - y) \tag{23}$$

where $\nabla_y$ is the gradient in the $y$-variables. Thanks to our assumptions on $f$ we may safely differentiate under the integral sign, thus

$$\nabla \pi(f)(y) \cdot (z - y) = \int_I \left(\nabla_y f(y, r) \cdot (z - y)\right) dr \tag{24}$$

for almost any choice of $y$. From (23) and (24),

$$\int_I S_T\{g_z, f_y\}(r)\,dr = \int_I S_{\tilde{T}}\{g, f\}(y, r)\,dr + \nabla \pi(f)(y) \cdot (z - y) \tag{25}$$

for almost any choice of $y$. The equality (25) may be reformulated as

$$\int_I S_T\{g_z, f_y\} - \left(\int_I f_y\right) \log \frac{\int_I g_z}{\int_I f_y} = \int_I S_{\tilde{T}}\{g, f\}(y, r)\,dr - S_z\{\pi(g), \pi(f)\}(y). \tag{26}$$

We may apply Proposition 2.1 thanks to our assumption (22) and conclude that

$$\int_I |T(r) - r|^2 f_y(r)\,dr \le CR^2 \left[\int_I S_T\{g_z, f_y\} - \left(\int_I f_y\right) \log \frac{\int_I g_z}{\int_I f_y}\right]. \tag{27}$$

The lemma follows from (26) and (27). □



Remark: The identity (24) is the only place in the proof of Theorem 3.1 where we use the assumption that $Q$ is a cube or a box, rather than, say, a parallelepiped.

Let $K \subset \mathbb{R}^n$ be a convex set. Let $f, g$ be positive, integrable functions on $K$. We say that a map $T : K \to K$ transports the last coordinate monotonically if there exists a map $F : \pi(K) \to \pi(K)$ such that for almost any $y \in \pi(K)$, the function $g_{F(y)}$ is integrable and furthermore

$$T(y, r) = (F(y), T_y(r)) \tag{28}$$

for any $r$ with $(y, r) \in K$, where $T_y$ is the monotone transportation map between the positive functions $f_y$ and $g_{F(y)}$.

Lemma 3.3 Let $R \ge 1$. Let $Q = A \times I \subset \mathbb{R}^n$ where $I \subset \mathbb{R}$ is an interval of length one and $A$ is a convex set. Assume that $f$ is a positive, Lipschitz function on $Q$, and that

$$f(\lambda x + (1 - \lambda)y) \le R\,[\lambda f(x) + (1 - \lambda) f(y)]$$

for any $0 < \lambda < 1$ and any $x, y \in Q$ for which $x - y$ is proportional to one of the standard unit vectors. Let $g$ be any positive, integrable function on $Q$. Assume that $T : Q \to Q$ is a measurable map that transports the last coordinate monotonically. Then,

$$\int_Q |T_y(r) - r|^2 f(y, r)\,dy\,dr \le CR^2 \left[\int_Q S_T\{g, f\} - \int_{\pi(Q)} S_F\{\pi(g), \pi(f)\}\right] \tag{29}$$

where $F$ and $T_y$ are as in (28), and $C \le 40/9$ is a universal constant.
Proof: According to Lemma 3.2, for almost any $y \in \pi(Q)$,

$$\int_I |T_y(r) - r|^2 f_y(r)\,dr \le CR^2 \left[\int_I S_T\{g, f\}(y, r)\,dr - S_F\{\pi(g), \pi(f)\}(y)\right]. \tag{30}$$

We integrate (30) over $y \in \pi(Q)$ and obtain (29). □



Lemma 3.4 Let $R \ge 1$. Let $Q \subset \mathbb{R}^n$ be a cube parallel to the axes. Assume that $f : Q \to \mathbb{R}$ is a Lipschitz function on $Q$, such that

$$f(\lambda x + (1 - \lambda)y) \le R\,[\lambda f(x) + (1 - \lambda) f(y)] \tag{31}$$

for any $0 < \lambda < 1$ and any $x, y \in Q$ for which $x - y$ is proportional to one of the standard unit vectors in $\mathbb{R}^n$. Then also

$$\pi(f)(\lambda x + (1 - \lambda)y) \le R\,[\lambda \pi(f)(x) + (1 - \lambda)\pi(f)(y)]$$

for any $0 < \lambda < 1$ and any $x, y \in \pi(Q)$ for which $x - y$ is proportional to one of the standard unit vectors in $\mathbb{R}^{n-1}$.
Proof: Fix $i = 1, \dots, n - 1$ and let $e_i$ be the $i$-th standard unit vector. Condition (31) implies that for any $y \in \mathbb{R}^{n-1}$, $t_1, t_2, r \in \mathbb{R}$, $0 < \lambda < 1$ such that $(y + t_1 e_i, r) \in Q$ and $(y + t_2 e_i, r) \in Q$,

$$f(y + (\lambda t_1 + (1 - \lambda)t_2)e_i, r) \le R\,[\lambda f(y + t_1 e_i, r) + (1 - \lambda) f(y + t_2 e_i, r)].$$

Let $I$ be the interval for which $Q = \pi(Q) \times I$. Integrating with respect to $r$ we have

$$\begin{aligned}
\pi(f)(y + (\lambda t_1 + (1 - \lambda)t_2)e_i) &= \int_I f(y + (\lambda t_1 + (1 - \lambda)t_2)e_i, r)\,dr \\
&\le R \int_I [\lambda f(y + t_1 e_i, r) + (1 - \lambda) f(y + t_2 e_i, r)]\,dr \\
&= R\,[\lambda \pi(f)(y + t_1 e_i) + (1 - \lambda)\pi(f)(y + t_2 e_i)],
\end{aligned}$$

and the lemma is proven. □



Proof of Theorem 3.1: We need to show that there exists a transportation map $T$ with

$$\int_Q |T(x) - x|^2 f(x)\,dx \le CR^2 \left[\int_Q S\{g, f\} - \left(\int_Q f\right) \log \frac{\int_Q g}{\int_Q f}\right].$$

Since $S_T\{g, f\} \le S\{g, f\}$ pointwise, it suffices to prove that there exists a transportation map $T$ between $f$ and $g$ such that

$$\int_Q |T(x) - x|^2 f(x)\,dx \le \frac{40}{9} R^2 \left[\int_Q S_T\{g, f\} - \left(\int_Q f\right) \log \frac{\int_Q g}{\int_Q f}\right]. \tag{32}$$

We will prove (32) by induction on the dimension. The case $n = 1$ is Proposition 2.1. Assume that (32) was proven for dimension $n - 1$, and let us prove it for dimension $n$. Thus, suppose that we are given a cube $Q \subset \mathbb{R}^n$ and functions $f, g$ which satisfy the assumptions of the theorem. In view of Lemma 3.4 we may apply the induction hypothesis for the functions $\pi(f)$ and $\pi(g)$ on the cube $\pi(Q) \subset \mathbb{R}^{n-1}$. Thus, there exists a transportation map $F : \pi(Q) \to \pi(Q)$ between $\pi(f)$ and $\pi(g)$ such that

$$\int_{\pi(Q)} |F(y) - y|^2 \pi(f)(y)\,dy \le \frac{40}{9} R^2 \left[\int_{\pi(Q)} S_F\{\pi(g), \pi(f)\} - \left(\int_{\pi(Q)} \pi(f)\right) \log \frac{\int_{\pi(Q)} \pi(g)}{\int_{\pi(Q)} \pi(f)}\right]. \tag{33}$$

For $y \in \pi(Q)$ let $T_y$ be the monotone transportation map between $f_y$ and $g_{F(y)}$, a strictly-increasing function which is well-defined for almost any $y \in \pi(Q)$. We set

$$T(y, r) = (F(y), T_y(r)) \qquad \text{for } (y, r) \in Q.$$

Then $T$ transports the last coordinate monotonically. Hence, according to Lemma 3.3,

$$\int_Q |T_y(r) - r|^2 f(y, r)\,dy\,dr \le \frac{40}{9} R^2 \left[\int_Q S_T\{g, f\} - \int_{\pi(Q)} S_F\{\pi(g), \pi(f)\}\right]. \tag{34}$$

It is straightforward to verify that the map $T$ is a transportation map between $f$ and $g$. In fact, in the construction of the map $T$ above we precisely follow the definition of the Knothe transportation map from [12]. By summing (33) and (34), we conclude that

$$\begin{aligned}
\int_Q \left[|F(y) - y|^2 + |T_y(r) - r|^2\right] f(y, r)\,dy\,dr &\le \frac{40}{9} R^2 \left[\int_Q S_T\{g, f\} - \left(\int_{\pi(Q)} \pi(f)\right) \log \frac{\int_{\pi(Q)} \pi(g)}{\int_{\pi(Q)} \pi(f)}\right] \\
&= \frac{40}{9} R^2 \left[\int_Q S_T\{g, f\} - \left(\int_Q f\right) \log \frac{\int_Q g}{\int_Q f}\right].
\end{aligned} \tag{35}$$

All that remains is to note that when $x = (y, r)$,

$$|T(x) - x|^2 = |F(y) - y|^2 + |T_y(r) - r|^2.$$

The bound (32) follows from (35), and the theorem is proven. □



4 Log-concavity 

Definition 1.4 fits very nicely with log-concave functions, as we shall see in this section.



Lemma 4.1 Let $f, g : \mathbb{R}^n \to [0, \infty)$ be probability densities and let $M \ge 1$. Assume that $f$ is log-concave and that $g(x) \le M f(x)$ for all $x \in \mathbb{R}^n$. Then,

$$\mathrm{Tire}(g \,\|\, f) \le \log M.$$



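Proof: The function $f$ is differentiable almost-everywhere in the convex set $\mathrm{Supp}(f)$ as it is a log-concave function. Denote $f = e^{-\psi}$. From the convexity of $\psi$ we see that for any point $x \in \mathrm{Supp}(f)$ in which $f$ is differentiable,

$$\psi(x) + \nabla \psi(x) \cdot (y - x) \le \psi(y) \qquad (y \in \mathrm{Supp}(f)).$$

Consequently, denoting $\varphi = -\log g$ we find that for almost any $x \in \mathrm{Supp}(f)$,

$$\begin{aligned}
S\{g, f\}(x) &= \sup_{y \in \mathrm{Supp}(g)} \left[f(x) \log \frac{g(y)}{f(x)} - \nabla f(x) \cdot (y - x)\right] \\
&= f(x) \sup_{y \in \mathrm{Supp}(g)} \left[\psi(x) - \varphi(y) + \nabla \psi(x) \cdot (y - x)\right] \\
&\le f(x) \sup_{y \in \mathrm{Supp}(g)} \left[\psi(y) - \varphi(y)\right] \le f(x) \log M.
\end{aligned}$$

Consequently,

$$\mathrm{Tire}(g \,\|\, f) = \int_{\mathrm{Supp}(f)} S\{g, f\} \le \int_{\mathrm{Supp}(f)} f(x)\,(\log M)\,dx = \log M.$$

□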



Suppose that $\mu_1$ and $\mu_2$ are Borel probability measures on $\mathbb{R}^n$. The transportation cost between $\mu_1$ and $\mu_2$ is defined to be

$$W_2^2(\mu_1, \mu_2) = \inf_{\gamma} \int_{\mathbb{R}^n \times \mathbb{R}^n} |x - y|^2\,d\gamma(x, y)$$

where the infimum runs over all couplings $\gamma$ of $\mu_1$ and $\mu_2$, i.e., all Borel probability measures $\gamma$ on $\mathbb{R}^n \times \mathbb{R}^n$ whose first marginal is $\mu_1$ and whose second marginal is $\mu_2$. See, e.g., Villani's book [22] for more information about the transportation metric $W_2$. Note that the transportation cost enjoys convexity properties of the form

$$W_2^2\Big(\mu, \sum_{i=1}^N \lambda_i \nu_i\Big) \le \sum_{i=1}^N \lambda_i W_2^2(\mu, \nu_i) \tag{36}$$

for any $N \ge 1$, Borel probability measures $\mu, \nu_1, \dots, \nu_N$ on $\mathbb{R}^n$, and non-negative $\lambda_i$'s that sum to one. Next, for a measure $\mu$ and measurable sets $A, B$ with $0 < \mu(A) < \infty$ define

$$\mu|_A(B) = \frac{\mu(A \cap B)}{\mu(A)}.$$

Thus the probability measure $\mu|_A$ is the conditioning of $\mu$ to the set $A$.
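For intuition about the transportation cost, here is a minimal sketch for the simplest possible setting, which is an assumption of this illustration and not the setting of the theorems: two empirical measures on the real line with the same number of equally weighted atoms. In that case the monotone (sorted) coupling is optimal for the quadratic cost, and the infimum over couplings is attained at a permutation, so a brute-force search over permutations can serve as a cross-check for tiny inputs. The function names are hypothetical.

```python
from itertools import permutations

def w2_squared_sorted(xs, ys):
    """W_2^2 between uniform empirical measures on xs and ys (equal length), via the monotone coupling."""
    assert len(xs) == len(ys)
    return sum((x - y) ** 2 for x, y in zip(sorted(xs), sorted(ys))) / len(xs)

def w2_squared_bruteforce(xs, ys):
    """Minimum of the quadratic cost over all permutation couplings; only feasible for tiny inputs."""
    n = len(xs)
    return min(sum((x - y) ** 2 for x, y in zip(xs, p)) for p in permutations(ys)) / n

xs = [0.1, 0.9, 0.4, 0.7, 0.2]
ys = [0.8, 0.3, 0.5, 0.0, 0.6]
print(w2_squared_sorted(xs, ys), w2_squared_bruteforce(xs, ys))
```

The sorted coupling here is the discrete counterpart of the monotone transportation map of Section 2.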

Lemma 4.2 Let $\Omega$ be a measurable space and fix a reference probability measure $\mu$ on $\Omega$. Denote by $\mathcal{P}$ the convex set of all probability measures on $\Omega$ that are absolutely continuous with respect to $\mu$. Then for any $\nu \in \mathcal{P}$, denoting $f = d\nu/d\mu$,

$$\int_\Omega f (\log f)\,d\mu \ge \inf \left\{ \sum_{i=1}^N \lambda_i \log \left\| \frac{d\nu_i}{d\mu} \right\|_\infty \, ; \ \nu = \sum_{i=1}^N \lambda_i \nu_i,\ N \ge 1,\ \nu_i \in \mathcal{P},\ \lambda_i \ge 0 \right\}.$$

Proof: Consider first the case where $f$ is a simple function (i.e., its range is a finite set). Then the measure $\nu$ takes the form $\nu = \sum_{i=1}^N \lambda_i \nu_i$ where $\nu_i = \mu|_{A_i}$ with disjoint sets $A_1, \dots, A_N \subset \Omega$ and non-negative $\lambda_i$'s. Therefore,

$$\int_\Omega f (\log f)\,d\mu = \sum_{i=1}^N \lambda_i \log \frac{1}{\mu(A_i)} = \sum_{i=1}^N \lambda_i \log \left\| \frac{d\nu_i}{d\mu} \right\|_\infty$$

and the conclusion of the lemma holds true in this case. The passage from a simple function to an arbitrary probability density is done via a routine approximation argument. □



It is not very difficult to see that the inequality in Lemma 4.2 is in fact an equality. The following theorem reminds us of Talagrand's transportation-cost inequalities for product measures from [20].

Theorem 4.3 Let $R \ge 1$. Let $Q \subset \mathbb{R}^n$ be a unit cube parallel to the axes. Assume that $\mu$ is a probability measure on $Q$ with a log-concave density $f$. Assume that

$$f(\lambda x + (1 - \lambda)y) \le R\,[\lambda f(x) + (1 - \lambda) f(y)] \tag{37}$$

for any $0 < \lambda < 1$ and any $x, y \in Q$ for which $x - y$ is proportional to one of the standard unit vectors in $\mathbb{R}^n$. Let $\nu$ be a probability measure on $Q$ that is absolutely continuous with respect to $\mu$. Then,

$$W_2^2(\mu, \nu) \le CR^2\,D(\nu \,\|\, \mu)$$

where $D(\nu \,\|\, \mu) = \int_Q g (\log g)\,d\mu$ for $g = d\nu/d\mu$, the usual relative entropy functional, and where $C \le 40/9$ is a universal constant.
Proof: By a standard approximation argument (e.g., convolve $\mu$ with a tiny Gaussian and restrict to the cube $Q$), we may assume that $f$ is $C^1$-smooth up to the boundary in $Q$, and in particular it is a Lipschitz function. According to Theorem 3.1 and Lemma 4.1, for any probability measure $\nu$ on $Q$ which is absolutely continuous with respect to $\mu$,

$$W_2^2(\mu, \nu) \le CR^2 \log \left\| \frac{d\nu}{d\mu} \right\|_\infty. \tag{38}$$

The important observation is that $W_2^2(\mu, \nu)$ is convex in $\nu$, in view of (36) above. Hence we may look at decompositions $\nu = \sum_{i=1}^N \lambda_i \nu_i$ and optimize the right-hand side of (38) over such decompositions. Lemma 4.2 completes the proof. □

Transportation-cost inequalities such as Theorem 4.3 are the subject of the comprehensive survey by Gozlan and Léonard [8]. The fact that transportation-cost inequalities imply concentration inequalities goes back to Marton [14]. The following proof reproduces her argument, and is included here for completeness.

Proof of Theorem 1.2: Denote $E = Q \setminus (A + tB_2^n)$. If $\mu(E) = 0$ then there is nothing to prove. Otherwise, we apply Theorem 4.3 for the measure $\nu = \mu|_E$. Thus there exists a coupling $\gamma$ of $\mu$ and $\mu|_E$ with

$$\int_{Q \times E} |y - x|^2\,d\gamma(x, y) \le \frac{40}{9} R^2\,D(\nu \,\|\, \mu) = \frac{40}{9} R^2 \cdot \log \frac{1}{\mu(E)}.$$

According to the Markov-Chebyshev inequality, there exists a subset $F \subset Q \times E$ with $\gamma(F) \ge 41/81$ such that for any $(x, y) \in F$,

$$|y - x|^2 \le 9R^2 \log \frac{1}{\mu(E)}. \tag{39}$$

Since $\gamma$ is a coupling and $\mu(A) \ge 1/2$ with $\gamma(F) \ge 41/81$, there exists $(x, y) \in F$ with $x \in A$. For such $(x, y)$,

$$x \in A,\ y \in E \qquad \text{and} \qquad |x - y| \le 3R \cdot \sqrt{\log \frac{1}{\mu(E)}}$$

where we used (39). However, all points in $E$ are of distance at least $t$ from all points of $A$. Consequently,

$$t \le 3R \cdot \sqrt{\log \frac{1}{\mu(E)}}.$$

Therefore $\mu(E) \le \exp(-t^2/\alpha^2)$ for $\alpha = 3R$ and $\mu(A + tB_2^n) \ge 1 - \exp(-t^2/\alpha^2)$, as required. □

Proof of Theorem 1.1: Let $T > 0$. Observe that the validity of both the assumptions and the conclusions of the theorem is not altered under the scaling

$$\ell \mapsto T\ell, \qquad M \mapsto T^{-2}M.$$

We may thus normalize so that $\ell = 1$. All that remains is to verify that the assumptions of Theorem 1.2 are satisfied with $R = e^{M/8}$. Fix $i = 1, \dots, n$ and $x \in Q$ and denote $h(t) = \psi(x + te_i)$. Then $h$ is well-defined on a certain interval $I \subset \mathbb{R}$ of length one, and our goal is to show that for any $a, b \in I$ and $0 < \lambda < 1$,

$$e^{-h(\lambda a + (1 - \lambda)b)} \le e^{M/8} \left[\lambda e^{-h(a)} + (1 - \lambda)e^{-h(b)}\right]. \tag{40}$$

In view of the arithmetic/geometric means inequality, the desired inequality (40) would follow once we establish that

$$-h(\lambda a + (1 - \lambda)b) \le M/8 - \lambda h(a) - (1 - \lambda)h(b). \tag{41}$$

In order to prove (41), we use our assumption that $h''(t) \le M$ in the interval $I$. According to the Taylor theorem, for any $x, y \in I$,

$$h(y) - h(x) - h'(x)(y - x) \le M\,\frac{(x - y)^2}{2}. \tag{42}$$

We will apply inequality (42) for $y = a, b$ and $x = \lambda a + (1 - \lambda)b$, then add the resulting inequalities with coefficients $\lambda$ and $1 - \lambda$. This yields

$$\lambda h(a) + (1 - \lambda)h(b) - h(\lambda a + (1 - \lambda)b) \le M\,\frac{\lambda(1 - \lambda)(b - a)^2}{2} \le \frac{M}{8} \tag{43}$$

as $\lambda(1 - \lambda) \le 1/4$ and $|b - a| \le 1$. The inequality (41) follows from (43). □
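The inequality (40) can be tested numerically in the extremal case where the second derivative is identically $M$. The sketch below assumes the example $h(t) = Mt^2/2$ on $I = [-1/2, 1/2]$, which is a choice made for this illustration and not taken from the text, and checks (40) with $R = e^{M/8}$ on a grid; the function name `check_40` is hypothetical.

```python
import math

def check_40(M, a, b, lam):
    """Inequality (40) for the assumed example h(t) = M * t^2 / 2, up to rounding error."""
    h = lambda t: M * t * t / 2.0
    lhs = math.exp(-h(lam * a + (1.0 - lam) * b))
    rhs = math.exp(M / 8.0) * (lam * math.exp(-h(a)) + (1.0 - lam) * math.exp(-h(b)))
    return lhs <= rhs * (1.0 + 1e-12)

grid = [-0.5 + k / 10.0 for k in range(11)]          # points of I = [-1/2, 1/2]
lams = [k / 10.0 for k in range(11)]
all_ok = all(
    check_40(M, a, b, lam)
    for M in (0.5, 2.0, 8.0)
    for a in grid
    for b in grid
    for lam in lams
)
print(all_ok)
```

For this example equality holds in (40) at $a = -1/2$, $b = 1/2$, $\lambda = 1/2$, so the factor $e^{M/8}$ cannot be improved within this argument.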



It is well-known (see, e.g., V. Milman and Schechtman [17, Section 2 and Appendix V]) that Theorem 1.1 implies a concentration inequality for Lipschitz functions as follows:

Corollary 4.4 Let $Q, \mu, \alpha$ be as in Theorem 1.1 (or as in Theorem 1.2). Let $f : Q \to \mathbb{R}$ be a 1-Lipschitz function, i.e., $|f(x) - f(y)| \le |x - y|$ for any $x, y \in Q$. Denote $E = \int_Q f\,d\mu$. Then, for any $t > 0$,

$$\mu\{x \in Q ; |f(x) - E| \ge t\} \le Ce^{-ct^2/\alpha^2}$$

where $c, C > 0$ are universal constants.

In particular, we deduce from Corollary 4.4 that in the notation of Theorem 1.1,

$$\mathrm{Cov}(\mu) \le C\alpha^2 \cdot \mathrm{Id} \tag{44}$$

in the sense of symmetric matrices, where $\mathrm{Cov}(\mu)$ is the covariance matrix of the probability measure $\mu$ and $C > 0$ is a universal constant.

Remark 4.5 Regarding the dependence of $\alpha(\ell, M)$ on $M$ in Theorem 1.1: Let $X_0, \dots, X_n$ be independent standard Gaussian random variables. Consider the random vector

$$Y = \frac{(X_1, \dots, X_n)}{100\sqrt{\log n}} + \frac{(X_0, \dots, X_0)}{100\sqrt{\log n}},$$

and let $Z$ be the conditioning of $Y$ to the cube $Q = [-1/2, 1/2]^n$. Denote by $\mu$ the distribution of $Z$, a probability measure on $Q$. It is not too difficult to verify that $\mu$ satisfies the requirements of Theorem 1.1 with $\ell = 1$ and $M = C \log n$. Set $A = \{x \in Q ; \sum_i x_i \le 0\}$. Then $\mu(A) = 1/2$. However, one may compute that for any $t \le cn^{1/2}/\sqrt{\log n}$,

$$\mu(A + tB_2^n) \le 2/3.$$

This shows that $\alpha(1, C \log n) \ge cn^{1/2}/\sqrt{\log n}$. Therefore the exponential dependence of the dimension-free expression $\alpha(\ell, M)$ on $\ell^2 M$ is inevitable. A simple variant of this example shows that it is also impossible to replace the cube $Q$ of sidelength $\ell$ in Theorem 1.1 by a Euclidean ball of radius $\ell\sqrt{n}$. For another example in which the cube behaves better than the Euclidean ball, see [11, Corollary 3].
It was explained by E. Milman [16] that in the log-concave case, Gaussian concentration inequalities, quadratic transportation-cost inequalities, and log-Sobolev inequalities are all essentially equivalent up to universal constants. In particular, by using the results of Otto and Villani [18, Corollary 3.1], we deduce from Theorem 4.3 the following log-Sobolev and Poincaré inequalities:

Corollary 4.6 Let $\ell, M, Q, \mu$ be as in Theorem 1.1 (or as in Theorem 1.2 with $\ell = 1$ and $R = e^{M/8}$). Then, for any locally-Lipschitz function $f : Q \to \mathbb{R}$ with $\int_Q f^2\,d\mu = 1$,



Here, $C > 0$ is a universal constant.

It is conceivable that Theorem 1.1 and Corollary 4.6 will turn out to be relevant to the analysis of lattice models in physics. For instance, one may suggest an Ising model with bounded, real spins as in Royer [19, Section 4.2] in which the assumptions of Theorem 1.1 are satisfied. Essentially, we require that the spins lie in the interval $[-1, 1]$, that the entire Hamiltonian is convex (just convex, not strictly-convex) and that the second derivatives of the pairwise potentials and the self-interactions are bounded. Perhaps the logarithmic Sobolev inequality of Corollary 4.6 may be used to demonstrate the uniqueness of the Gibbs measure and the lack of phase transition in such a model.